Efficient and Scalable Sequence-Based XML Filtering
Abstract
The ubiquitous adoption of XML as the standard for data exchange over the web has led to increased interest in building efficient and scalable XML publish-subscribe (pub-sub) systems. The central function of an XML-based pub-sub system is to perform XML filtering efficiently, i.e., to identify those XPath expressions that have a match in a streaming XML document. In this paper, we propose a new sequence-based approach, which transforms both XML documents and XPath twig expressions into Node Encoded Tree Sequences (NETS). In terms of this encoding, we provide a necessary and sufficient condition for an XPath twig to represent a match in a given XML document. The proposed filtering procedure is based on a new subsequence matching algorithm devised for NETS, which identifies the set of matched queries, free of false positives, with a single scan of the XML document. Extensive experimental results show that the NETS method outperforms previous XML filtering approaches.
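To make the filtering task concrete, the sketch below shows a naive baseline that evaluates each subscription against a parsed document, one query at a time. It is not the NETS algorithm, whose encoding is not specified in this abstract; it only illustrates what "identifying the XPath expressions that have a match" means. The function name `filter_document`, the sample document, and the subscription ids are hypothetical, and the paths are limited to the XPath subset supported by Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET


def filter_document(xml_text, subscriptions):
    """Return the ids of subscriptions whose path expression matches the document.

    Naive baseline (an assumption for illustration, not the NETS method):
    each expression is evaluated independently against the parsed tree,
    so the cost grows with the number of subscriptions.
    """
    root = ET.fromstring(xml_text)
    matched = set()
    for sub_id, path in subscriptions.items():
        # ElementTree supports only a limited XPath subset; paths are
        # interpreted relative to the document's root element.
        if root.find(path) is not None:
            matched.add(sub_id)
    return matched


# Hypothetical document and subscriptions.
doc = "<book><title>XML</title><author><name>Ann</name></author></book>"
subs = {
    "q1": "./author/name",        # simple child path
    "q2": ".//name",              # descendant axis
    "q3": "./publisher",          # no such element in the document
    "q4": "./author[name]/name",  # small twig: a branch predicate on author
}

print(sorted(filter_document(doc, subs)))  # ['q1', 'q2', 'q4']
```

This baseline parses the whole document and evaluates every query separately. The approach described in the abstract instead aims to find all matching queries in a single scan of the streamed document.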
Similar resources
Efficient Filtering and Routing in a Scalable XML-Based Publish-Subscribe System
This paper introduces YAK, a scalable content-based publish-subscribe system. YAK employs XML documents and expressive XPath queries as the publication and subscription model. To achieve high scalability, it combines the advantages of content routing in existing publish-subscribe systems and the efficient query indexing technique in the context of XML filtering. The filtering and routing strate...
XML Filtering Using Dynamic Hierarchical Clustering of User Profiles
Information filtering systems constitute a critical component in modern information seeking applications. As the number of users grows and the amount of available information increases, it is crucial to employ scalable and efficient representation and filtering techniques. In this paper we propose an innovative XML filtering system that utilizes clustering of user profiles in order to reduce the...
XFIS: an XML filtering system based on string representation and matching
Information-filtering systems constitute a critical component of modern information-seeking applications. As the number of users grows and the amount of information available becomes even bigger, it is imperative to employ scalable and efficient representation and filtering techniques. Typically, the use of eXtensible Markup Language (XML) representation entails profile representation with the ...
YFilter: Efficient and Scalable Filtering of XML Documents
Soon, much of the data exchanged over the Internet will be encoded in XML, allowing for sophisticated filtering and content-based routing. We have built a filtering engine called YFilter, which filters streaming XML documents according to XQuery or XPath queries that involve both path expressions and predicates. Unlike previous work, YFilter uses a novel NFA-based execution model. In this demon...
Value-based predicate filtering of XML documents
In recent years, publish–subscribe systems based on XML filtering have received much attention in ubiquitous computing environments and Internet applications. The main challenge is to process a large volume of content against millions of user subscriptions. Several XML filtering systems focus on the efficient processing of structural matching of user subscriptions represented as XPath twig patt...